agGraphSearch, knitr, SPARQL, EBImage
Last modified: 2021-05-14 13:29:20
Compiled: Fri May 14 13:29:54 2021
In the context of the semantic web, structured data refers to data that is written according to explicit rules and that contains reference relationships between data items. In other words, it is data that has been made more machine-readable by adding metadata to it. Typical examples of structured data or structured knowledge are ontologies and Linked Open Data (LOD) such as Wikidata and DBpedia. Structured data is often described using the RDF data model.
Understanding an object or domain of interest often requires structured knowledge. However, the subject of interest is usually described by unstructured data, i.e., a loosely defined collection of text or vocabulary. As a result, there is a large gap to bridge when mapping one kind of data to the other.
Unless you are an expert in ontologies and LOD, you need to define your subject of interest clearly at an early stage and then map it to structured data, and this mapping is difficult.
To support the construction of an initial model of structured data for the domain of interest, we built a toolset that extracts a subset of the corresponding structured data based on a small vocabulary list of interest.
Figure 1: Overview of the domain ontology construction
This tutorial walks through the procedure for obtaining structured data from LOD, using a real case study.
The agGraphSearch package is a toolset that supports the construction of domain ontologies. It provides a methodology for extracting target domain concepts from large-scale public Linked Open Data (LOD). The proposed method obtains the class hierarchy of the domain concepts from the occurrences of common upper-level entities and the chains of path relationships between them. The proposed method is outlined in Figure 2.
Figure 2: Overview of the upper-level concept graph and analysis algorithm
The numbers in the nodes indicate the number of search entities that exist in the subordinate concepts.
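The node counts described above can be illustrated with a small base-R sketch. It is not the package's implementation: the edge list, the entity names, and the helper `ancestors_of()` are all hypothetical, and the sketch only assumes that the graph is stored as subject/parentClass pairs.

```r
# Toy edge list (hypothetical names): each row is "subject is a subclass of parentClass"
edges <- data.frame(
  subject     = c("s1", "s2", "s3", "m1", "m2"),
  parentClass = c("m1", "m1", "m2", "top", "top"),
  stringsAsFactors = FALSE
)
search_entities <- c("s1", "s2", "s3")

# Hypothetical helper: collect all ancestors of one node by following
# subject -> parentClass edges upward
ancestors_of <- function(node, edges) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    parents <- unique(edges$parentClass[edges$subject %in% frontier])
    # skip nodes that were already visited, so cycles terminate
    frontier <- setdiff(parents, found)
    found <- c(found, frontier)
  }
  found
}

# For every upper-level node, count how many search entities lie below it
counts <- table(unlist(lapply(search_entities, ancestors_of, edges = edges)))
counts
```

In this toy graph, `top` has all three search entities below it, while `m1` has two and `m2` has one.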
As an example of class hierarchy extraction from LOD, this short tutorial provides a workflow to obtain and visualize conceptual hierarchies related to leukemia from the Wikidata endpoint, using a few entity labels.
An overview of the workflow of the proposed method is shown in Figure 3.
Figure 3: Overview of the workflow of the proposed method
This result is similar to the network graph obtained with Wikidata Graph Builder.
The following commands install agGraphSearch if it is not yet available and then load it.
#install
if(!require("agGraphSearch")){
install.packages( "devtools" )
devtools::install_github( "kumeS/agGraphSearch" )
}
#load
library("agGraphSearch")
#GitHub URL
#browseURL("https://github.com/kumeS/agGraphSearch")
Figure 4: Data model for the Wikidata class hierarchy
This tutorial focuses mainly on the data model for class hierarchies in Wikidata, shown in Figure 4. The class hierarchy of Wikidata is expressed with the properties subClassOf (wdt:P279) and instanceOf (wdt:P31), which describe conceptual relationships between entities. Wikidata entities are identified by IDs called QIDs. In addition to QIDs, this tutorial uses the label (rdfs:label) and alias (skos:altLabel) properties, which link QIDs to their label information.
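To make the data model concrete, the sketch below assembles a SPARQL query that looks up an entity by label or alias and returns its parent classes. The function `build_parent_query()` is a hypothetical helper written for this illustration, not part of agGraphSearch; the graph name is the one printed in the query output of this tutorial. The sketch only builds the query text and does not contact an endpoint.

```r
# Hypothetical helper (not part of agGraphSearch): build a SPARQL query that
# finds entities by English label or alias and returns their parent classes
build_parent_query <- function(label,
                               graph = "http://wikidata_nearly_full_201127") {
  template <- paste(
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>",
    "SELECT DISTINCT ?subject ?parentClass",
    "FROM <%s>",
    "WHERE {",
    "  { ?subject rdfs:label \"%s\"@en . }",
    "  UNION",
    "  { ?subject skos:altLabel \"%s\"@en . }",
    "  ?subject wdt:P279 ?parentClass .",
    "}",
    sep = "\n"
  )
  # fill in the graph name and the label (used for both label and alias)
  sprintf(template, graph, label, label)
}

query <- build_parent_query("acute lymphocytic leukemia")
cat(query, "\n")
```

Replacing `wdt:P279` with `wdt:P31` in the last triple pattern would return the classes of which the entity is an instance.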
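#search terms (the three labels listed in the result table of this tutorial)
terms <- c("acute lymphocytic leukemia",
           "Chronic eosinophilic leukemia",
           "philadelphia-positive myelogenous leukemia")
ter00 <- terms[1]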
#check Query
CkeckQuery_agCount_Label_Num_Wikidata_P279_P31(Entity_Name = ter00)
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## ### 001 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?subject) as ?Count_As_Label)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en.
## }
## ```````````````````````````````````````````
## ### 002 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?subject) as ?Count_As_AltLabel)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en.
## }
## ```````````````````````````````````````````
## ### 003 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?parentClass ) as ?Count_Of_ParentClass_Label)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en.
## ?subject wdt:P279 ?parentClass.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?parentClass ) as ?Count_Of_ParentClass_altLabel)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en.
## ?subject wdt:P279 ?parentClass.
## }
## ```````````````````````````````````````````
## ### 004 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?childClass ) as ?Count_Of_ChildClass_Label)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en.
## ?childClass wdt:P279 ?subject.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?childClass ) as ?Count_Of_ChildClass_altLabel)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en.
## ?childClass wdt:P279 ?subject.
## }
## ```````````````````````````````````````````
## ### 005 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?instance ) as ?Count_InstanceOf_Label)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en.
## ?subject wdt:P31 ?instance.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?instance ) as ?Count_InstanceOf_altLabel)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en.
## ?subject wdt:P31 ?instance.
## }
## ```````````````````````````````````````````
## ### 006 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?instance ) as ?Count_Has_Instance_Label)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en.
## ?instance wdt:P31 ?subject.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?instance ) as ?Count_Has_Instance_altLabel)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en.
## ?instance wdt:P31 ?subject.
## }
## ```````````````````````````````````````````
#Endpoint
agGraphSearch::KzLabEndPoint_Wikidata$EndPoint
#Graph id
agGraphSearch::KzLabEndPoint_Wikidata$FROM
#run SPARQL
#library(SPARQL)
res <- agCount_Label_Num_Wikidata_P279_P31(Entity_Name = ter00,
Dir="02_Short_Out")
res
#View table
#agTableDT(res, Width = "100px", Transpose = TRUE, AutoWidth=FALSE)
The following code runs the SPARQL queries for each search term in a for-loop.
The input consists of three terms.
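The loop-and-collect pattern used here can be sketched in base R without an endpoint. In this sketch, `count_chars()` is a hypothetical stand-in for a per-term query function, and `do.call(rbind, ...)` is used as a base-R analogue for combining the results, assuming each call returns a one-row data frame with identical columns.

```r
# Hypothetical stand-in for a per-term query function:
# returns a one-row data frame for each input term
count_chars <- function(term) {
  data.frame(LABEL = term, N_char = nchar(term), stringsAsFactors = FALSE)
}

terms_demo <- c("alpha", "be", "gam")

# Run the function once per term and keep the results in a list
results <- lapply(terms_demo, count_chars)

# Bind the one-row data frames into a single data frame
combined <- do.call(rbind, results)
combined
```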
#create an empty list
m <- list()
#Run
for(n in seq_along(terms)){
  #message(n)
  m[[n]] <- agCount_Label_Num_Wikidata_P279_P31(Entity_Name = terms[n],
                                                 Dir="02_Short_Out")
}
#convert list to data.frame
(fm <- ListDF2DF(m))
## LABEL Hit_Label Hit_ALL Hit_upClass_All
## 1 acute lymphocytic leukemia 1 9 3
## 2 Chronic eosinophilic leukemia 1 3 2
## 3 philadelphia-positive myelogenous leukemia 1 1 1
## Hit_downClass_All Hit_subClassOf Hit_InstanceOf Hit_subClassOf_ParentClass
## 1 6 8 1 2
## 2 1 2 1 1
## 3 0 1 0 1
## Hit_subClassOf_ChildClass Hit_InstanceOf_ParentClass
## 1 6 1
## 2 1 1
## 3 0 0
## Hit_InstanceOf_ChildClass Count_Of_Label Count_Of_AltLabel
## 1 0 1 0
## 2 0 1 0
## 3 0 1 0
## Count_Of_subClassOf_ParentClass_Label
## 1 2
## 2 1
## 3 1
## Count_Of_subClassOf_ParentClass_altLabel Count_Of_subClassOf_ChildClass_Label
## 1 0 6
## 2 0 1
## 3 0 0
## Count_Of_subClassOf_ChildClass_altLabel Count_Of_InstanceOf_ParentClass_Label
## 1 0 1
## 2 0 1
## 3 0 0
## Count_Of_InstanceOf_ParentClass_altLabel Count_Of_InstanceOf_ChildClass_Label
## 1 0 0
## 2 0 0
## 3 0 0
## Count_Of_InstanceOf_ChildClass_altLabel
## 1 0
## 2 0
## 3 0
#View the data
#agTableDT(fm, Width = "100px", Transpose = TRUE, AutoWidth=FALSE)
fm1 <- fm[c(fm$Hit_Label > 0),]
fm2 <- fm1[c(fm1$Hit_ALL > 0),]
#dim(fm); dim(fm1); dim(fm2)
Lab01 <- fm2$LABEL
#Check Query
CkeckQuery_agWD_Alt_Wikidata(Lab01[1])
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT distinct ?subject
## From <http://wikidata_nearly_full_201127>
## WHERE {
## optional{ ?subject rdfs:label "acute lymphocytic leukemia"@en. }
## optional{ ?subject skos:altLabel "acute lymphocytic leukemia"@en. }
## }
## ```````````````````````````````````````````
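#View query
#QID is the vector of Wikidata IDs found for the labels in Lab01
#(wd:Q180664, wd:Q5113976 and wd:Q55790812 in the result table of this tutorial)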
CkeckQuery_agCount_ID_Num_Wikidata_QID_P279_P31(QID[1])
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT (count(distinct ?parentClass) as ?Count_Of_ParentClass)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## wd:Q180664 wdt:P279 ?parentClass.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?childClass) as ?Count_Of_ChildClass)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?childClass wdt:P279 wd:Q180664.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?instance) as ?Count_InstanceOf)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## wd:Q180664 wdt:P31 ?instance.
## }
## ```````````````````````````````````````````
## SELECT (count(distinct ?instance) as ?Count_Has_Instance)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## ?instance wdt:P31 wd:Q180664.
## }
## ```````````````````````````````````````````
#create an empty list
QID_res <- list()
#Try SPARQL with QID (loop over the QIDs that are actually queried)
for(n in seq_along(QID)){
  QID_res[[n]] <- agCount_ID_Num_Wikidata_QID_P279_P31(QID[n])
}
#convert list to data frame
QID_res2 <- ListDF2DF(QID_res)
#check results
head(QID_res2)
dim(QID_res2)
colnames(QID_res2)
#All
table(QID_res2$Hit_All)
table(QID_res2$Hit_All > 0)
table(QID_res2$Hit_All_Parent > 0)
table(QID_res2$Hit_All_Child > 0)
#View the results
#agTableDT(QID_res2, Width = "100px", Transpose = TRUE, AutoWidth=FALSE)
This step searches for specific neighboring entities and properties and records whether each one is present. If a search entity has one of these entities or properties as a neighbor, the search entity is excluded. The idea is shown in Figure 5.
Examples of neighboring entities:
- family name (wd:Q101352)
- movie (wd:Q11424)
Examples of neighboring properties:
- sex or gender (wdt:P21)
- located in the administrative territorial entity (wdt:P131)
Figure 5: Exclusion of non-applicable entities by relationships with the adjacent entity and the property
#For neighboring entities
#Check query
CkeckQuery_agCount_ID_Prop_Obj_Wikidata_vP( Entity_ID=QID[1], Object="wd:Q101352" )
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT (count(distinct ?p) as ?Count)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## wd:Q180664 ?p wd:Q101352.
## }
## ```````````````````````````````````````````
#create an exclusion QID list without "wd:"
ExcluQ <- c("Q101352", "Q11424")
NumQ <- length(ExcluQ)
QIDdf <- data.frame(QID=QID)
#run SPARQL
for(m in seq_len(NumQ)){
  #print(ExcluQ[m])
  res <- list()
  for(n in seq_along(QID)){
    res[[n]] <- agCount_ID_Prop_Obj_Wikidata_vP(Entity_ID=QID[n],
                                                Object=paste0("wd:", ExcluQ[m]))
  }
  res1 <- ListDF2DF(res)
  #add one logical column per excluded entity
  QIDdf[[ExcluQ[m]]] <- as.numeric(unlist(res1)) > 0
}
#View the result
agTableKB(QIDdf)
| QID | Q101352 | Q11424 |
|---|---|---|
| wd:Q180664 | FALSE | FALSE |
| wd:Q5113976 | FALSE | FALSE |
| wd:Q55790812 | FALSE | FALSE |
#For neighboring properties
#Check query
CkeckQuery_agCount_ID_Prop_Obj_Wikidata_vO( Entity_ID=QID[1], Property="wdt:P21")
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT (count(distinct ?o) as ?Count)
## From <http://wikidata_nearly_full_201127>
## WHERE {
## wd:Q180664 wdt:P21 ?o.
## }
## ```````````````````````````````````````````
#create an exclusion list without "wdt:"
ExcluP <- c("P21", "P131")
NumP <- length(ExcluP)
#run SPARQL
for(m in seq_len(NumP)){
  print(ExcluP[m])
  res <- list()
  for(n in seq_along(QID)){
    res[[n]] <- agCount_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n],
                                                Property=paste0("wdt:", ExcluP[m]))
  }
  res1 <- ListDF2DF(res)
  #add one logical column per excluded property
  QIDdf[[ExcluP[m]]] <- as.numeric(unlist(res1)) > 0
}
#view the result
agTableKB(QIDdf)
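Once the flag table is complete, the exclusion itself amounts to dropping every entity that has at least one TRUE flag. The following is a minimal base-R sketch of that filter, not a function from agGraphSearch. The table `flags` mimics the layout of `QIDdf`, but its IDs and flag values are made up for illustration.

```r
# Toy flag table with the same layout as QIDdf
# (IDs and flag values are illustrative only)
flags <- data.frame(
  QID     = c("wd:Q_example1", "wd:Q_example2", "wd:Q_example3"),
  Q101352 = c(FALSE, TRUE, FALSE),
  P21     = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# All columns except the ID column are exclusion flags
flag_cols <- setdiff(colnames(flags), "QID")

# Keep an entity only if none of its flags is TRUE
keep <- rowSums(flags[, flag_cols, drop = FALSE]) == 0
kept_qid <- flags$QID[keep]
kept_qid
```

Here the second entity is dropped because it is connected to the excluded entity `Q101352`.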
# instanceOf (wdt:P31)
CkeckQuery_agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P31")
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT distinct ?o ?oLabelj ?oLabele
## From <http://wikidata_nearly_full_201127>
## WHERE {
## wd:Q55790812 wdt:P31 ?o .
## ?o rdfs:label ?oLabelj . filter(LANG(?oLabelj) = "ja").
## ?o rdfs:label ?oLabele . filter(LANG(?oLabele) = "en").
## }
## ```````````````````````````````````````````
#create an empty variable
res3 <- c()
#run SPARQL
for(n in seq_len(length(QID))){
res3[[n]] <- agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P31")
}
# subClassOf (wdt:P279)
CkeckQuery_agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P279")
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT distinct ?o ?oLabelj ?oLabele
## From <http://wikidata_nearly_full_201127>
## WHERE {
## wd:Q55790812 wdt:P279 ?o .
## ?o rdfs:label ?oLabelj . filter(LANG(?oLabelj) = "ja").
## ?o rdfs:label ?oLabele . filter(LANG(?oLabele) = "en").
## }
## ```````````````````````````````````````````
#create an empty variable
res4 <- c()
#run SPARQL
for(n in seq_len(length(QID))){
res4[[n]] <- agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P279")
}
#convert list to data.frame
res3b <- ListDF2DF(res3)
res4b <- ListDF2DF(res4)
res <- rbind(res3b, res4b)
#remove rows with NA on "o" col
(res.na <- res[!is.na(res$o),])
#View the result
#agTableDT(res.na, Width = "100px", Transpose = FALSE, AutoWidth=FALSE)
#create a new folder
if(!dir.exists("03_Short_Out")){dir.create("03_Short_Out")}
#create an empty variable
res5 <- c()
#run SPARQL; search the upper-level classes
for(n in 1:length(QID)){
message(n)
res5[[n]] <- PropertyPath_GraphUp_Wikidata(Entity_ID = QID[n],
Depth = 30)
}
#check results
head(res5[[1]])
agTableDT(res5[[1]])
#Count rows
checkNrow_af(res5)
#Detect loop
checkLoop_af(res5)
#Save
saveRDS(res5,
file="./03_Short_Out/Individual_upGraph.Rdata",
compress = TRUE)
Alternatively, the same search can be run with the purrr::map function:
#run SPARQL with purrr::map function
res5m <- purrr::map(QID,
PropertyPath_GraphUp_Wikidata,
Depth = 30)
#check results
#Count rows
checkNrow_af(res5m)
#Detect loop
checkLoop_af(res5m)
#create a new folder
if(!dir.exists("03_Short_Out_vis")){dir.create("03_Short_Out_vis")}
#create networks
for(n in 1:length(res5)){
#n <- 1
a <- agIDtoLabel_Wikidata(Entity_ID = QID[n])
if(is.na(a[,2])){a[,2] <- a[,3]}
Lab00 <- paste(a[,c(2, 1)], collapse = ".")
FileName <- paste0("agVisNetwork_", Lab00,"_", format(Sys.time(), "%y%m%d"),".html")
#run the network creation
agVisNetwork(Graph=res5[[n]],
Selected=Lab00,
Browse=FALSE,
Output=TRUE,
FilePath=FileName)
Sys.sleep(1)
filesstrings::file.move(files=FileName,
destinations="./03_Short_Out_vis",
overwrite = TRUE)
Name <- paste0("./agVisNetwork_",
formatC(n, flag="0", width=4),
"_", Lab00, "_files")
#remove the leftover "_files" folder; unlink() is needed for non-empty directories
if(dir.exists(Name)){unlink(Name, recursive = TRUE)}
}
#View the results
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern=".html")[1]))
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern=".html")[2]))
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern=".html")[3]))
#Merge their graphs to one graph
res6 <- ListDF2DF(res5)
#check NAs
table(is.na(res6))
#Delete duplicates
res6d <- Exclude_Graph_duplicates(input=res6)
#check dim
dim(res6); dim(res6d)
#Save
saveRDS(res6d,
file="./03_Short_Out/Merged_upGraph.Rdata",
compress = TRUE)
#run the network creation
if(TRUE){
FileName <- paste0("agVisNetwork_Merged", "_",
format(Sys.time(), "%y%m%d"),".html")
agVisNetwork(Graph=res6d,
Browse=FALSE,
Output=TRUE,
FilePath=FileName)
filesstrings::file.move(files=FileName,
destinations="./03_Short_Out_vis",
overwrite = TRUE)
}
#View the results
#browseURL(paste0("./03_Short_Out_vis/", FileName))
Figure 6: Merged network diagrams for search terms related to leukemia
The common upper-level concept is defined based on the edge list of triples obtained above.
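The idea of a common upper-level concept can be sketched with base R alone: count, for each parent class, how many search entities reach it, and keep the classes reached by at least two. This is an illustration under simplifying assumptions, not the internals of the package functions used in this section; the parent lists below are hypothetical.

```r
# Toy parent-class lists (hypothetical IDs), one vector per search entity
parents_by_term <- list(
  term1 = c("A", "B", "C"),
  term2 = c("B", "C"),
  term3 = c("C", "D")
)

# Each search entity contributes a given parent class at most once
all_parents <- unlist(lapply(parents_by_term, unique), use.names = FALSE)

# Frequency of each parent class across the search entities
freq <- as.data.frame(table(parentClass = all_parents),
                      stringsAsFactors = FALSE)
freq <- freq[order(-freq$Freq, freq$parentClass), ]

# Keep the classes shared by at least two search entities
common <- freq[freq$Freq >= 2, ]
common
```

With a cutoff of 2, only `C` (reached by three terms) and `B` (reached by two) remain.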
##Graph data without the duplicates
#Number of entities
(E01 <- length(unique(c(res6d$subject, res6d$parentClass))))
#Number of labels
(E02 <- length(unique(c(res6d$subjectLabel, res6d$parentClassLabel))))
#Number of Triples
(E03 <- length(unique(res6d$triples)))
#Gathering the parent concepts
upEntity <- unlist(purrr::map(res5, function(x){unique(x$parentClass)}))
#calculate the frequency of common entities
Count_upEntity_DF <- countCommonEntities(upEntity)
#Count and view table
agTableDT(Count_upEntity_DF, Transpose = F, AutoWidth = FALSE)
#Count Freq
table(Count_upEntity_DF$Freq)
#extract parentClass & parentClassLabel from the merged dataset
Dat <- data.frame(res6d[,c(colnames(res6d) == "parentClass" |
colnames(res6d) == "parentClassLabel")],
stringsAsFactors = F)
head(Dat)
#Delete the duplicates
Dat0 <- Exclude_duplicates(Dat, 1)
head(Dat0)
dim(Dat); dim(Dat0)
#define the common upper-level entities
dim(Count_upEntity_DF); dim(Dat0)
head(Count_upEntity_DF); head(Dat0)
Count_upEntity_DF2 <- Cutoff_FreqNum(input1=Count_upEntity_DF,
input2=Dat0,
By="parentClass",
Sort="Freq",
FreqNum=2)
#check the results
head(Count_upEntity_DF2, n=10)
table(Count_upEntity_DF2$Freq)
#save
saveRDS(Count_upEntity_DF2,
file = "./03_Short_Out/Count_upEntity_DF2.Rdata", compress = TRUE)
readr::write_excel_csv(Count_upEntity_DF2,
file="./03_Short_Out/Count_upEntity_DF2.csv")
#Count_upEntity_DF2 <- readRDS(file = "./03_Short_Out/Count_upEntity_DF2.Rdata")
#Calculation of inclusion rate
QID <- QIDdf$QID
##QID
#combine both columns before unique(); a second argument to unique() is not merged in
qid <- unique(c(res6d$subject, res6d$parentClass))
b <- setdiff(QID, qid)
b; length(b)
##rdfsLabel
#RdfsLabel <- unique(c(res6d$subjectLabel, res6d$parentClassLabel))
FileName <- paste0("./FrequencyGraph_", format(Sys.time(), "%y%m%d_%H%M"),".html")
pc_plot(Count_upEntity_DF2,
SaveFolder="03_Short_Out_vis",
FileName=FileName,
IDnum=3)
#View the results
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern="FrequencyGraph_")[2]))
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern="FrequencyGraph_")[1]))
#Individual graphs
eachGraph <- readRDS("./03_Short_Out/Individual_upGraph.Rdata")
head(eachGraph[[1]])
sapply(eachGraph, dim)
#Search entities
(list1a <- readRDS("./02_Short_Out/SearchEntities.Rdata"))
head(list1a)
any(list1a == "wd:Q35120")
#Common entities
list2a <- readRDS("./03_Short_Out/Count_upEntity_DF2.Rdata")
head(list2a)
dim(list2a)
list2b <- unique(list2a$parentClass)
head(list2b)
any(list2b == "wd:Q35120")
#Remove Q35120 from the common list.
list2b <- list2b[list2b != "wd:Q35120"]
#Inclusion of list1a and list2b
table(list1a %in% list2b)
table(list2b %in% list1a)
system.time(
SearchNum <- agGraphAnalysis(eachGraph,
list1a,
list2b,
LowerSearch=TRUE)
)
head(SearchNum)
table(SearchNum$Levels)
sum(table(SearchNum$Levels))
table(SearchNum$Levels)
table(!is.na(SearchNum[,2]))
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] agGraphSearch_0.99.1 SPARQL_1.16 RCurl_1.98-1.3
## [4] XML_3.99-0.6 EBImage_4.32.0 BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] locfit_1.5-9.4 lattice_0.20-41 tidyr_1.1.3
## [4] visNetwork_2.0.9 fftwtools_0.9-11 png_0.1-7
## [7] assertthat_0.2.1 digest_0.6.27 utf8_1.2.1
## [10] R6_2.5.0 tiff_0.1-8 filesstrings_3.2.2
## [13] evaluate_0.14 httr_1.4.2 ggplot2_3.3.3
## [16] highr_0.9 pillar_1.6.0 rlang_0.4.10
## [19] lazyeval_0.2.2 data.table_1.14.0 jquerylib_0.1.4
## [22] DT_0.18 rmarkdown_2.7 readr_1.4.0
## [25] stringr_1.4.0 htmlwidgets_1.5.3 franc_1.1.3
## [28] igraph_1.2.6 munsell_0.5.0 compiler_4.0.2
## [31] xfun_0.22 pkgconfig_2.0.3 BiocGenerics_0.36.1
## [34] htmltools_0.5.1.1 tidyselect_1.1.0 tibble_3.1.1
## [37] bookdown_0.22 viridisLite_0.4.0 fansi_0.4.2
## [40] crayon_1.4.1 dplyr_1.0.5 bitops_1.0-7
## [43] grid_4.0.2 jsonlite_1.7.2 formattable_0.2.1
## [46] gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1
## [49] magrittr_2.0.1 scales_1.1.1 stringi_1.5.3
## [52] bslib_0.2.4 ellipsis_0.3.2 vctrs_0.3.8
## [55] generics_0.1.0 tools_4.0.2 glue_1.4.2
## [58] purrr_0.3.4 hms_1.0.0 jpeg_0.1-8.1
## [61] networkD3_0.4 abind_1.4-5 parallel_4.0.2
## [64] yaml_2.2.1 colorspace_2.0-0 BiocManager_1.30.12
## [67] strex_1.4.2 plotly_4.9.3 knitr_1.33
## [70] sass_0.3.1